Comparing common and rare single variant and gene aggregate instrumentation strategies for molecular MR

Author

Aimee Hanson

Published

September 29, 2025

Introduction

Classically, Mendelian Randomisation (MR) methods utilising trait-associated genetic variants from GWAS studies have employed common polymorphisms (population MAF > 1%) targeted by genotyping chips, or reliably imputed from reference populations, to instrument modifiable exposures. However, common variants tested in GWAS typically explain a very small fraction of the variability in a measured trait, potentially exhibit pleiotropic effects acted upon by balancing selection or as a consequent of genetic linkage, and are rarely causal. Rare variants, which typically show large biological effects (e.g. through abolishing protein expression) provide a means of more unambiguously instrumenting relevant molecular processes. Comparison of causal estimates derived using differing methods of genetically instrumenting modifiable exposures may enhance the interpretation of the biological mechanisms underlying exposure-outcome relationships. This includes using variants from across the allele frequency spectrum, but also leveraging rare variant aggregate approaches to instrument gene-level perturbations in expression and function.

Here, caual estimates for pairwises combinations of 2937 molecular exposures (O-link plasma proteins) and 11 health related compelex traits (below) have been derived using nine instrumentation strategies, with further instrument splitting into cis (variants within 1MB of the encoding gene start and end site), trans and full variant sets (a total of 27 instrument sets per exposure).

Outcomes: Low density lipoprotein levels (LDL), Body Mass Index (BMI), Vitamin D (VitD), Triglycerides (Trig), Glycated Haemoglobin (HbA1c), Mean Platelet Volume (PlatVol), IGF-1 (IGF1), Waist-to-Hip Ratio (WHRadjBMI), Red Blood Cell Erythrocyte Count (RBCCount), Mean Corpuscular Volume (CorpVol) and Systolic Blood Pressure (SBP)

Instrumental variable (IV) sets

Nine sets of IVs for each protein exposure have been extracted from across two studies:
GWAS common variants: “Plasma proteomic associations with genetics and health in the UK Biobank” (Sun, B. 2023, Nature).
WES variants/gene aggregate tests: “Rare variant associations with plasma protein levels in the UK Biobank” (Dhindsa, R. 2023, Nature).

Common GWAS
Common genome-wide fine-mapped variants from the GPMAP (>1% MAF) (cis, trans and combined)
Common genome-wide clumped and pruned variants (>1% MAF) (cis, trans and combined)

WES (variants)
Common exome-wide (>1% MAF) (cis, trans and combined)
Rare exome-wide (0-1% MAF) (cis, trans and combined)
Ultra-rare exome-wide (0-0.1% MAF) (cis, trans and combined)
LD clumping was performed for common variants (for which LD matrices have been derived) only. For rare variants, instrument sets were stored both pre and post filtering to the top associated variant per gene for sensitivity analyses.

WES (masks)
pLOF burden mask (cis, trans and combined)
missense/low-confidence burden mask (cis, trans and combined)
pLOF/missense/low-confidence burden mask (cis, trans and combined)
synonymous burden mask (cis, trans and combined)

Data import

Read in MR results from scripts/04_analysis_proteins/001_performmolecularMR.R and harmonised summary statistics:

Code
files <- list.files(res_dir, pattern = "results_molecular", full.names = T)

# Import MR results
res_MR <- list()
for(i in 1:length(files)){
  load(files[i])
  res_MR[[i]] <- results_MR
}
names(res_MR) <- sub(".*_MR_(.*)\\.rda","\\1", files)

# 0.05/2937 proteins tested per outcome
alpha <- signif(0.05/length(res_MR$BMI),2)

# IVsets
IVsets <- c("dhindsa_exwas_mask_ptv" = "mask:pLOF",
            "dhindsa_exwas_mask_raredmg" = "mask:missense",
            "dhindsa_exwas_mask_ptvraredmg" = "mask:pLOF|missense",
            "dhindsa_exwas_mask_syn" = "mask:synonymous",
            "dhindsa_exwas_variant_ultrarare" = "exome:ultrarare",
            "dhindsa_exwas_variant_ultrarare_filt" = "exome:ultrarare (filtered)",
            "dhindsa_exwas_variant_rare" = "exome:rare",
            "dhindsa_exwas_variant_rare_filt" = "exome:rare (filtered)",
            "dhindsa_exwas_variant_common" = "exome:common (clumped)",
            "sun_gwas_variant_common_clumped" = "genome:common (clumped)",
            "sun_gwas_variant_common_finemapped" = "genome:common (fmapped)")

cols_IVsets <- c("darkorchid4","darkorchid3","darkorchid2","orchid",
                 "steelblue4","steelblue4","steelblue1","steelblue1","darkorange2","orange","orange")
names(cols_IVsets) <- IVsets

# Variant-based instrument sets
variant_IVsets <- c("dhindsa_exwas_variant_ultrarare" = "exome:ultrarare",
                    "dhindsa_exwas_variant_ultrarare_filt" = "exome:ultrarare (filtered)",
                    "dhindsa_exwas_variant_rare" = "exome:rare",
                    "dhindsa_exwas_variant_rare_filt" = "exome:rare (filtered)",
                    "dhindsa_exwas_variant_common" = "exome:common (clumped)",
                    "sun_gwas_variant_common_clumped" = "genome:common (clumped)",
                    "sun_gwas_variant_common_finemapped" = "genome:common (fmapped)")

cols_variant_IVsets <- c("steelblue4","steelblue4","steelblue1","steelblue1","darkorange2","orange","orange")
names(cols_variant_IVsets) <- variant_IVsets

# Filter variant-based instruments 
variant_IVsets_indep <- c("dhindsa_exwas_variant_ultrarare_filt" = "exome:ultrarare (filtered)",
                          "dhindsa_exwas_variant_rare_filt" = "exome:rare (filtered)",
                          "dhindsa_exwas_variant_common" = "exome:common (clumped)",
                          "sun_gwas_variant_common_clumped" = "genome:common (clumped)",
                          "sun_gwas_variant_common_finemapped" = "genome:common (fmapped)")

Comparison of IV sets

The average number of genetic instruments (pQTLs) per protein across instrument sets

Rare and common variant instruments (pQTLs) have been partitioned according to their proximity to the instrumented protein-coding gene into cis (+- 1MB from gene boundaries) and trans (>1MB distant) sets. There is a higher average number of exonic rare and ultra-rare variant instruments per protein derived from WES data than common GWAS derived instruments. Proteins that are instrumented with >100 rare cis variants fall in gene dense genomic regions. Available IVs drop when restricting to one rare variant per annotated gene, but rare IVs have not been explicitly LD pruned so it is possible that in some instances these variants are not independent. More common variant trans-IVs can be derived from the WES over the GWAS datasets (with clumping and pruning), potentially due to the higher coverage of coding regions involved in trans-gene regulation (via modification of pathwaway effects) by exome sequencing data.

Code
# Number of instruments per protein for each IV set
num_IVs <- lapply(res_MR, function(x){
  df_out <- data.frame(expand.grid(IV_set = IV_sets, cis_trans = c("cis","trans")))
  
  for(i in 1:length(x)){
    df <- x[[i]] |> dplyr::summarise(unique(nsnp), .by = c(IV_set, cis_trans))
    colnames(df)[3] <- names(x[i])
    df_out <- left_join(df_out, df, by = c("IV_set","cis_trans"))
  }
  
  df_out <- df_out |> tidyr::pivot_longer(cols = 3:length(df_out)) |> na.exclude()
  return(df_out)
})

num_IVs <- do.call("rbind", num_IVs) |> dplyr::mutate(outcome = sub(".*_","",name))

# Plot
p1 <- mapply(FUN = function(x,y){
  x <- x |> dplyr::filter(cis_trans == y) |>
    dplyr::mutate(label = factor(IVsets[IV_set], levels = IVsets)) |>
    dplyr::filter(label %in% grep("exome|genome", IVsets, value = T))
  
  ggplot(x, aes(x = label, y = value, fill = label)) +
    facet_grid(.~outcome) +
    stat_boxplot(geom ='errorbar', width = 0.2) + 
    geom_boxplot(width = 0.7) +
    scale_y_log10() +
    scale_fill_manual(values = cols_IVsets) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          legend.position = "none",
          plot.title = element_text(hjust = 0.5)) +
    labs(x = "IV set",
         y = "Num. instruments", 
         title = paste0("Number of ",y ," instruments per plasma protein"))
  
}, x = list(num_IVs,num_IVs), y = c("cis","trans"), SIMPLIFY = FALSE)

gridExtra::grid.arrange(grobs = p1)
Figure 1

The number of proteins able to be instrumented with pQTLs across instrument sets

Nearing 100% of O-link plasma proteins can be instrumented with either rare or common trans-IVs derived from WES data relative to <75% when using GWAS derived pQTLs. Instrumenting proteins using rare cis-IVs from WES provides greater coverage of the plasma proteome than using common GWAS derived pQTLs, though common GWAS pQTLs provide better coverage than common WES derived pQTLs, suggesting cis-regulatory variants outside the coding region have significant effects on protein levels. Here, only instrument sets including >=3 independent variants are considered.

Code
# Number of proteins with significant causal estimates across instrument sets
summary_sig <- lapply(res_MR, summarise_results, p_thresh = alpha, n_snps = 3)
summary_sig_cis <- lapply(res_MR, summarise_results, p_thresh = alpha, n_snps = 3, cis_trans = "cis")
summary_sig_trans <- lapply(res_MR, summarise_results, p_thresh = alpha, n_snps = 3, cis_trans = "trans")

# Number of proteins with instrument sets (of >= n_snps) available
avail_IVs <- do.call("rbind",lapply(summary_sig, function(x){apply(x$min_p, MARGIN = 2, function(y){sum(!is.na(y))})/nrow(x$min_p) * 100}))
avail_IVs_cis <- do.call("rbind",lapply(summary_sig_cis, function(x){apply(x$min_p, MARGIN = 2, function(y){sum(!is.na(y))})/nrow(x$min_p) * 100}))
avail_IVs_trans <- do.call("rbind",lapply(summary_sig_trans, function(x){apply(x$min_p, MARGIN = 2, function(y){sum(!is.na(y))})/nrow(x$min_p) * 100}))

# Plot
p2 <- mapply(FUN = function(x,y){
  x[,names(variant_IVsets)] |> 
    as.data.frame() |> 
    tibble::rownames_to_column("outcome") |>
    tidyr::pivot_longer(cols = 2:8) |>
    dplyr::mutate(label = factor(variant_IVsets[name], levels = variant_IVsets)) |>
    ggplot(aes(x = label, y = value, fill = label)) +
    facet_grid(.~outcome) +
    geom_col() +
    scale_fill_manual(name = "", values = cols_variant_IVsets) +
    scale_y_continuous(limits = c(0,100)) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          plot.title = element_text(hjust = 0.5),
          legend.position = "none") +
    labs(x = "IV set",
         y = "%", 
         title = paste0("Percentage of plasma proteins with ",y, "IV sets available (with n>=3 instruments)"))
}, x = list(avail_IVs, avail_IVs_cis, avail_IVs_trans), y = c("", "cis-", "trans-"), SIMPLIFY = FALSE)
  
gridExtra::grid.arrange(grobs = p2)
Figure 2

Proportion of instrumented proteins with significant IVW causal effects detected across instrument sets

Restricting to only those proteins that could be successfully instrumented (with n >= 3 IVs) by all strategies shown below, a significant causal effect of protein levels on a given outcome tend to be more often detected using common cis-IVs from GWAS than rare and ultra-rare cis-IVs. Here, significance has been defined at two thresholds, the liberal threshold of p<=0.05 and the conservative Bonferroni adjusted significance threshold given n=2937 independent proteins tested (p<=1.7x10^-5).

Code
# Calculate the proportion of significant exposures across variant sets
# Keep proteins instrumentable by all variant-based IV sets
prop_sig_IVs <- function(res_summary, p_thresh){
  
  df_p <- res_summary$min_p[,names(variant_IVsets_indep)] |>
    na.exclude() |>
    as.data.frame() |>
    tibble::rownames_to_column("row") |>
    dplyr::mutate(gene = sub("_.*","",row),
                  outcome = sub(".*_","",row)) |>
    tidyr::pivot_longer(2:6, names_to = "IV_set", values_to = "pval")
  
  df_beta <- res_summary$min_p_beta[,names(variant_IVsets_indep)] |>
    na.exclude() |>
    as.data.frame() |>
    tibble::rownames_to_column("row") |>
    dplyr::mutate(gene = sub("_.*","",row),
                  outcome = sub(".*_","",row)) |>
    tidyr::pivot_longer(2:6, names_to = "IV_set", values_to = "beta")
  
  df <- left_join(df_p, df_beta) |>
    dplyr::mutate(call = ifelse(pval <= p_thresh & beta > 0, "sig_up",
                                ifelse(pval <= p_thresh & beta < 0, "sig_down", "ns")))
  
  df_grouped <- df |> 
    dplyr::group_by(IV_set,call) |>
    dplyr::summarise(n_genes = n()) |>
    dplyr::mutate(prop = n_genes/sum(n_genes) * 100,
                  outcome = unique(df$outcome),
                  n_genes_tot = sum(n_genes))
  
  return(df_grouped)
}

sig_IVs <- do.call("rbind", lapply(summary_sig, prop_sig_IVs, p_thresh = alpha))
sig_IVs_cis <- do.call("rbind", lapply(summary_sig_cis, prop_sig_IVs, p_thresh = alpha))
sig_IVs_trans <- do.call("rbind", lapply(summary_sig_trans, prop_sig_IVs, p_thresh = alpha))

sig_IVs_0.01 <- do.call("rbind", lapply(summary_sig, prop_sig_IVs, p_thresh = 0.01))
sig_IVs_0.01_cis <- do.call("rbind", lapply(summary_sig_cis, prop_sig_IVs, p_thresh = 0.01))
sig_IVs_0.01_trans <- do.call("rbind", lapply(summary_sig_trans, prop_sig_IVs, p_thresh = 0.01))

# Plot
p3 <- mapply(function(x,y){
  x |> dplyr::filter(call %in% c("sig_up","sig_down")) |>
  dplyr::mutate(label = factor(variant_IVsets_indep[IV_set], levels = variant_IVsets_indep)) |>
  ggplot(aes(x = label, y = prop, fill = call)) +
  facet_grid(.~outcome) +
  geom_bar(position = "stack", stat = "identity") +
  scale_fill_manual(name = "Direction of effect", values = c("steelblue","tomato"), label = c("down","up")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        legend.position = "bottom",
        plot.margin = margin(t = 5, r = 5, b = 5, l = 30)) +
  labs(x = "IV set",
       y = "%", 
       title = paste0("Percentage of plasma protein causally associated with outcome using ", y, "IVs (at P<=0.01)"))
}, x = list(sig_IVs_0.01, sig_IVs_0.01_cis, sig_IVs_0.01_trans), y = c("all ","cis-","trans-"), SIMPLIFY = FALSE)

gridExtra::grid.arrange(grobs = p3)

p4 <- mapply(function(x,y){
  x |> dplyr::filter(call %in% c("sig_up","sig_down")) |>
  dplyr::mutate(label = factor(variant_IVsets_indep[IV_set], levels = variant_IVsets_indep)) |>
  ggplot(aes(x = label, y = prop, fill = call)) +
  facet_grid(.~outcome) +
  geom_bar(position = "stack", stat = "identity") +
  scale_fill_manual(name = "Direction of effect", values = c("steelblue","tomato"), label = c("down","up")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        legend.position = "bottom",
        plot.margin = margin(t = 5, r = 5, b = 5, l = 30)) +
  labs(x = "IV set",
       y = "%", 
       title = paste0("Percentage of plasma protein causally associated with outcome using ", y, "IVs (at P<=1.7x10^-5)"))
}, x = list(sig_IVs, sig_IVs_cis, sig_IVs_trans), y = c("all ","cis-","trans-"), SIMPLIFY = FALSE)

gridExtra::grid.arrange(grobs = p4)
Figure 3
Figure 4

Comparison of IVW effect estimates across instrument sets

Code
# Functions

# Generate a table of IVW estimates for a pair of instrument sets
compare_estimates <- function(results_MR, IVset1, IVset2, n_snps=3, cis_trans="all"){
  # Pull out IVW estimates and SE for instrument sets
  if(cis_trans == "all"){
    df <- lapply(results_MR, function(x){
      x |> dplyr::filter(method == "Inverse variance weighted", 
                         cis_trans == "cis_trans",
                         IV_set %in% c(IVset1,IVset2))
    })
  }
  if(cis_trans == "cis"){
    df <- lapply(results_MR, function(x){
      x |> dplyr::filter(method == "Inverse variance weighted", 
                         cis_trans == "cis",
                         IV_set %in% c(IVset1,IVset2))
    })
  }
  if(cis_trans == "trans"){
    df <- lapply(results_MR, function(x){
      x |> dplyr::filter(method == "Inverse variance weighted", 
                         cis_trans == "trans",
                         IV_set %in% c(IVset1,IVset2))
    })
  }
  df_out_b <- do.call("rbind",df) |> 
    dplyr::filter(nsnp >= n_snps) |>
    tidyr::pivot_wider(names_from = "IV_set", values_from = "b", id_cols = "exposure") |>
    na.exclude() |>
    dplyr::relocate(exposure,IVset1,IVset2)
  
  df_out_se <- do.call("rbind",df) |> 
    dplyr::filter(nsnp >= n_snps) |>
    tidyr::pivot_wider(names_from = "IV_set", values_from = "se", id_cols = "exposure") |>
    na.exclude() |>
    dplyr::relocate(exposure,IVset1,IVset2)
  
  ## Caluclate effect heterogeneity
  if (!(identical(df_out_b$exposure, df_out_se$exposure) &&
        identical(colnames(df_out_b), colnames(df_out_se)))) {
    message("Error - mismatch of beta and SE dataframes")
    return(NULL)
  }
  
  beta_vecs <- lapply(split(df_out_b[,c(2,3)], 1:nrow(df_out_b)), as.numeric)
  se_vecs <- lapply(split(df_out_se[,c(2,3)], 1:nrow(df_out_se)), as.numeric)
  
  het_res <- do.call("rbind", mapply(effect_heterogeneity, beta_vec = beta_vecs, se_vec = se_vecs, SIMPLIFY = F)) |>
    as.data.frame() |>
    dplyr::mutate(exposure = df_out_b$exposure, .before = "Q") |>
    dplyr::mutate(Qpval_bon = p.adjust(Qpval, method = "bonferroni"),
                  Qpval_fdr = p.adjust(Qpval, method = "fdr"))

  df_out <- dplyr::left_join(df_out_b, df_out_se, by = "exposure", suffix = c(".beta",".se")) |>
    dplyr::left_join(het_res, by = "exposure")
  
  return(df_out)
}

# Generate a table of cis vs trans IVW estimates for given instrument set
compare_cistrans <- function(results_MR, IVset, n_snps=3){
  # Pull out IVW estimates and SE for instrument sets
  df <- lapply(results_MR, function(x){
    x |> 
      dplyr::filter(method == "Inverse variance weighted",
                    IV_set %in% IVset,
                    nsnp >= n_snps,
                    cis_trans %in% c("cis","trans")) |>
      dplyr::mutate(IV_set = paste(IV_set, cis_trans, sep = "_")) |>
      dplyr::arrange(cis_trans)})
  df <- df[lapply(df, nrow) == 2]
  
  df_out_b <- do.call("rbind",df) |> 
    tidyr::pivot_wider(names_from = "IV_set", values_from = "b", id_cols = "exposure") |>
    na.exclude()
  
  df_out_se <- do.call("rbind",df) |> 
    tidyr::pivot_wider(names_from = "IV_set", values_from = "se", id_cols = "exposure") |>
    na.exclude()
  
  ## Caluclate effect heterogeneity
  if (!(identical(df_out_b$exposure, df_out_se$exposure) &&
        identical(colnames(df_out_b), colnames(df_out_se)))) {
    message("Error - mismatch of beta and SE dataframes")
    return(NULL)
  }
  
  beta_vecs <- lapply(split(df_out_b[,c(2,3)], 1:nrow(df_out_b)), as.numeric)
  se_vecs <- lapply(split(df_out_se[,c(2,3)], 1:nrow(df_out_se)), as.numeric)
  
  het_res <- do.call("rbind", mapply(effect_heterogeneity, beta_vec = beta_vecs, se_vec = se_vecs, SIMPLIFY = F)) |>
    as.data.frame() |>
    dplyr::mutate(exposure = df_out_b$exposure, .before = "Q") |>
    dplyr::mutate(Qpval_bon = p.adjust(Qpval, method = "bonferroni"),
                  Qpval_fdr = p.adjust(Qpval, method = "fdr"))

  df_out <- dplyr::left_join(df_out_b, df_out_se, by = "exposure", suffix = c(".beta",".se")) |>
    dplyr::left_join(het_res, by = "exposure")
  
  return(df_out)
}

plot_estimates <- function(estimates, name){
  estimates <- as.data.frame(estimates)
  col_names <- names(estimates)
  
  if(any(grepl("cis|trans",col_names))){
    lab_x <- "cis-IVs"
    lab_y <- "trans-IVs"
  } else {
    lab_x <- IVsets[sub("\\.beta","",col_names[2])]
    lab_y <- IVsets[sub("\\.beta","",col_names[3])]
  }
  
  CI_x <- data.frame(LCI = estimates[,2] - 1.96*estimates[,4], UCI = estimates[,2] + 1.96*estimates[,4])
  CI_y <- data.frame(LCI = estimates[,3] - 1.96*estimates[,5], UCI = estimates[,3] + 1.96*estimates[,5])
  
  mod_lm <- lm(estimates[,3] ~ estimates[,2])
  slope <- round(coefficients(mod_lm)[2],2)
  
  flag_point <- estimates$Qpval_bon <= 0.05
  
  range_min <- min(min(estimates[,2]), min(estimates[,3])) #min(na.exclude(errorbar_xmin, errorbar_ymin)))
  range_max <- max(max(estimates[,2]), max(estimates[,3])) #min(na.exclude(errorbar_xmax, errorbar_ymax)))
  
  estimates_plot <- estimates |>
    dplyr::mutate(exposure = factor(exposure, levels = estimates$exposure[order(estimates$Qpval, decreasing = TRUE)]),
                  errorbar_xmin = ifelse(flag_point, CI_x$LCI, NA),
                  errorbar_xmax = ifelse(flag_point, CI_x$UCI, NA),
                  errorbar_ymin = ifelse(flag_point, CI_y$LCI, NA),
                  errorbar_ymax = ifelse(flag_point, CI_y$UCI, NA)) |>
    dplyr::arrange(exposure)
  
  estimates_plot |>
    ggplot(aes_string(x = col_names[2], y = col_names[3])) +
    geom_abline(slope = 1, intercept = 0, colour = "steelblue", lty = 2, lwd = 0.5) +
    geom_point(aes(colour = estimates_plot$Qpval_bon <= 0.05), size = 2, shape = 16) +  
    geom_errorbar(xmin = estimates_plot$errorbar_xmin,
                  xmax = estimates_plot$errorbar_xmax, colour = "tomato") +
    geom_errorbar(ymin = estimates_plot$errorbar_ymin,
                  ymax = estimates_plot$errorbar_ymax, colour = "tomato") +
    geom_smooth(method = "lm", se = FALSE, colour = "tomato", lwd = 0.5) +
    annotate(geom = "text", x = range_min/2, y = range_max, label = paste0("Slope=",slope), size = 3, colour = "tomato") +
    scale_colour_manual(values = c("black","tomato")) +
    scale_x_continuous(limits = c(range_min, range_max)) +
    scale_y_continuous(limits = c(range_min, range_max)) +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5, size = 10), legend.position = "none") +
    labs(x=lab_x, y=lab_y, title = name)
}

1a) Common variant cis-IVs (finemapped vs clumped)
1b) Common variant trans-IVs (finemapped vs clumped)

2a) Common variant cis-IVs (GWAS vs ExWAS)
2b) Common variant trans-IVs (GWAS vs ExWAS)

Code
comp_IVWs_common_1a <- lapply(res_MR, FUN = compare_estimates, 
                    IVset1 = "sun_gwas_variant_common_clumped",
                    IVset2 = "sun_gwas_variant_common_finemapped",
                    n_snps = 3, cis_trans = "cis")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_common_1a, name = names(comp_IVWs_common_1a), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from common cis-pQTL instruments (finemapped vs clumped)")

comp_IVWs_common_1b <- lapply(res_MR, FUN = compare_estimates, 
                    IVset1 = "sun_gwas_variant_common_clumped",
                    IVset2 = "sun_gwas_variant_common_finemapped",
                    n_snps = 3, cis_trans = "trans")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_common_1b, name = names(comp_IVWs_common_1b), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from common trans-pQTL instruments (finemapped vs clumped)")

comp_IVWs_common_2a <- lapply(res_MR, FUN = compare_estimates, 
                    IVset1 = "dhindsa_exwas_variant_common",
                    IVset2 = "sun_gwas_variant_common_clumped",
                    n_snps = 3, cis_trans = "cis")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_common_2a, name = names(comp_IVWs_common_2a), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from GWAS common and WES common cis-pQTL instrumentation strategies")

comp_IVWs_common_2b <- lapply(res_MR, FUN = compare_estimates, 
                    IVset1 = "dhindsa_exwas_variant_common",
                    IVset2 = "sun_gwas_variant_common_clumped",
                    n_snps = 3, cis_trans = "trans")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_common_2b, name = names(comp_IVWs_common_2b), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from GWAS common and WES common trans-pQTL instrumentation strategies")
Figure 5
Figure 6
Figure 7
Figure 8

3a) Common vs rare variant cis-IVs (ExWAS)
3b) Common vs rare variant trans-IVs (ExWAS)

4a) Common vs ultra-rare variant cis-IVs (ExWAS)
4b) Common vs ultra-rare variant trans-IVs (ExWAS)

Code
comp_IVWs_3a <- lapply(res_MR, FUN = compare_estimates, 
                    IVset1 = "dhindsa_exwas_variant_common",
                    IVset2 = "dhindsa_exwas_variant_rare",
                    n_snps = 3,
                    cis_trans = "cis")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_3a, name = names(comp_IVWs_3a), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from WES common and rare cis-pQTL instrumentation strategies")

comp_IVWs_3b <- lapply(res_MR, FUN = compare_estimates, 
                    IVset1 = "dhindsa_exwas_variant_common",
                    IVset2 = "dhindsa_exwas_variant_rare",
                    n_snps = 3,
                    cis_trans = "trans")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_3b, name = names(comp_IVWs_3b), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from WES common and rare trans-pQTL instrumentation strategies")

comp_IVWs_4a <- lapply(res_MR, FUN = compare_estimates, 
                    IVset1 = "dhindsa_exwas_variant_common",
                    IVset2 = "dhindsa_exwas_variant_ultrarare",
                    n_snps = 3,
                    cis_trans = "cis")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_4a, name = names(comp_IVWs_4a), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from WES common and ultra-rare cis-pQTL instrumentation strategies")

comp_IVWs_4b <- lapply(res_MR, FUN = compare_estimates, 
                    IVset1 = "dhindsa_exwas_variant_common",
                    IVset2 = "dhindsa_exwas_variant_ultrarare",
                    n_snps = 3,
                    cis_trans = "trans")

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_4b, name = names(comp_IVWs_4b), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from WES common and ultra-rare trans-pQTL instrumentation strategies")

# Extract list of genes with significant heterogenity between common and rare/ultrarare variant estimates (cis or trans instruments)
het_genes <- lapply(
  Map(c, lapply(comp_IVWs_3a, function(x){x |> dplyr::filter(Qpval_bon <= 0.05) |> dplyr::pull(exposure)}),
      lapply(comp_IVWs_3b, function(x){x |> dplyr::filter(Qpval_bon <= 0.05) |> dplyr::pull(exposure)}),
      lapply(comp_IVWs_4a, function(x){x |> dplyr::filter(Qpval_bon <= 0.05) |> dplyr::pull(exposure)}),
      lapply(comp_IVWs_4b, function(x){x |> dplyr::filter(Qpval_bon <= 0.05) |> dplyr::pull(exposure)})), function(x){unique(x)})

het_genes <- mapply(FUN = function(x,y){paste(x,y,sep = "_")}, x = het_genes, y = names(het_genes), SIMPLIFY = F)
Figure 9
Figure 10
Figure 11
Figure 12
  1. Common cis-IVs vs trans-IVs (GWAS)
  2. Common cis-IVs vs trans-IVs (ExWAS)
  3. Rare cis-IVs vs trans-IVs (ExWAS)
  4. Ultra-rare cis-IVs vs trans-IVs
Code
comp_IVWs_5 <- lapply(res_MR, FUN = compare_cistrans, 
                    IVset = "sun_gwas_variant_common_clumped",
                    n_snps = 3)

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_5, name = names(comp_IVWs_5), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from GWAS common and cis- and trans-pQTL instrumentation strategies")

comp_IVWs_6 <- lapply(res_MR, FUN = compare_cistrans, 
                    IVset = "dhindsa_exwas_variant_common",
                    n_snps = 3)

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_6, name = names(comp_IVWs_6), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from ExWAS common and cis- and trans-pQTL instrumentation strategies")

comp_IVWs_7 <- lapply(res_MR, FUN = compare_cistrans, 
                    IVset = "dhindsa_exwas_variant_rare",
                    n_snps = 3)

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_7, name = names(comp_IVWs_7), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from ExWAS rare and cis- and trans-pQTL instrumentation strategies")

comp_IVWs_8 <- lapply(res_MR, FUN = compare_cistrans, 
                    IVset = "dhindsa_exwas_variant_ultrarare",
                    n_snps = 3)

gridExtra::grid.arrange(grobs = mapply(plot_estimates, estimates = comp_IVWs_8, name = names(comp_IVWs_7), SIMPLIFY = FALSE), nrow = 2,
                        top = "Comparison of IVW estimates from ExWAS ultra-rare and cis- and trans-pQTL instrumentation strategies")
Figure 13
Figure 14
Figure 15
Figure 16

Heterogenity in pQTL effects

Restricting to those protein exposures that show signficiant causal effects on a given outcome (at p <= 1.7^{-5} with >= 5 IVs), and significant heterogenity between rare and common variant IV sets (at P(FDR) < 0.05):

Code
# Retain exposure-outcome pairs with significant IVW estimate for at least one instrumentation strategy (with n instruments >= n_snps)
res_MR_top <- lapply(res_MR, filter_results, p_thresh = alpha, n_snps = 5)

# Filter to those with evidence of estimate heterogenity between variant sets
res_MR_top_het <- mapply(function(x,y){x[names(x) %in% y]}, x = res_MR_top, y = het_genes, SIMPLIFY = F)
Code
# Plot estimates
res_IVW <- lapply(res_MR_top_het, function(x){
  df <- do.call("rbind", x) |> 
    filter(method == "Inverse variance weighted",
           grepl("variant",IV_set),
           cis_trans %in% c("cis","trans")) |>
    dplyr::mutate(instrument_class = factor(variant_IVsets[IV_set], levels = variant_IVsets))
  return(df)})



plot_forest <- function(results_IVW){
  pd <- ggplot2::position_dodge2(width = 0.8, preserve = "single")
  exposure <- unique(sub(".*_", "", results_IVW$pair))
  
  n_row <- ceiling(length(unique(results_IVW$exposure))/4)
  
  p <- ggplot(results_IVW, aes(x = b, y = instrument_class, colour = instrument_class)) +
    facet_wrap(.~pair, ncol = 4, scale = "free_x") +
    geom_vline(xintercept = 0, colour = "grey20", lty = 2) +
    geom_point(aes(pch = cis_trans, size = nsnp), position = pd) +
    geom_errorbarh(aes(xmin = b-1.96*se, xmax = b+1.96*se, group = cis_trans), position = pd) +
    scale_colour_manual(values = cols_variant_IVsets) +
    scale_shape_manual(name = "", values = c(16,1)) +
    theme_bw() + guides(colour = "none") +
    theme(plot.title = element_text(hjust = 0.5)) +
    labs(x = "IVW", y = "", size = "n (instruments)", title = paste("Outcome:", exposure))

  return(p)
}

lapply(res_IVW, plot_forest)[c(1,3,6,9,11)]
Figure 17
Figure 18
Figure 19
Figure 20
Figure 21
Code
lapply(res_IVW, plot_forest)[7]
Figure 22
Code
lapply(res_IVW, plot_forest)[c(2,4,5,8,10)]
Figure 23
Figure 24
Figure 25
Figure 26
Figure 27

Next steps

  • Look into examples where:

    • common variants give highly significant causal estimates and rare variants don’t and vice versa
    • differences between GWAS (regulatory) and WES (coding) common variant IVW estimates
    • differences between cis and trans instrumentation
  • Some case studies with a more thorough interrogation of where the instruments are sitting and the biology they are instrumenting

  • Look at estimates from the gene-based masks

  • Pleiotropy:

    • Compare the pleiotropy profiles of rare and common (and cis and trans) instruments
  • Rare regulatory variants (required WGS data)